Deep Learning Project Workflow (Summary of Andrew Ng's DLSS 2016 Talk, with GitHub Link)
全球人工智能 (Global AI)
Source: GitHub
Deep Learning Project Workflow
This document attempts to summarize Andrew Ng's recommended machine learning workflow from his "Nuts and Bolts of Applying Deep Learning" talk at Deep Learning Summer School 2016. Any errors or misinterpretations are my own.
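Start Here
1. Measure human-level performance on your task.
2. Do your training and test data come from the same distribution?
   Yes: follow the same-distribution workflow below.
   No: follow the different-distribution workflow below.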
Measure Human-Level Performance
The real goal of measuring human-level performance is to estimate the Bayes Error Rate. Knowing your Bayes Error Rate helps you figure out whether your model is underfitting or overfitting your training data. More specifically, it lets us measure 'Bias' (as Ng defines it), which we use later in the workflow.
If Your Training and Test Data Come from the Same Distribution
1. Shuffle and split your data into Train / Dev / Test sets
Ng recommends a Train / Dev / Test split of approximately 70% / 15% / 15%.
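The split described above can be sketched as follows. This is a minimal illustration, not code from the talk: the function name `split_train_dev_test`, its parameters, and the fixed seed are all assumptions made for the example.

```python
import random


def split_train_dev_test(examples, dev_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle examples and split them into train / dev / test lists.

    The 15% / 15% defaults follow the approximate 70 / 15 / 15 split;
    the function itself is a hypothetical helper.
    """
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_dev = round(n * dev_frac)
    n_test = round(n * test_frac)
    n_train = n - n_dev - n_test

    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


train, dev, test = split_train_dev_test(range(1000))
print(len(train), len(dev), len(test))  # 700 150 150
```

Shuffling before slicing matters when the raw data is ordered (for example by class or by collection date), since an unshuffled slice would give the three sets different distributions.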
2. Measure your training error and dev set error, and calculate bias and variance
Calculate your bias and variance as:
Bias = (Training Set Error) - (Human Error)
Variance = (Dev Set Error) - (Training Set Error)
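As a quick worked example, the two formulas translate directly into code. The error rates below are hypothetical numbers chosen for illustration, and `bias_and_variance` is an assumed helper name.

```python
def bias_and_variance(human_error, train_error, dev_error):
    """Return (bias, variance) using the definitions given above."""
    bias = train_error - human_error      # gap between the model and human level
    variance = dev_error - train_error    # gap between training and dev performance
    return bias, variance


# Hypothetical error rates, expressed as fractions (0.05 means 5%).
bias, variance = bias_and_variance(human_error=0.01, train_error=0.05, dev_error=0.06)
print(f"bias={bias:.2f} variance={variance:.2f}")  # bias=0.04 variance=0.01
```

With these made-up numbers the bias is four times the variance, so the workflow would address bias first.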
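3. Do you have high bias? Fix it first.
High bias means the training set error is well above human-level error.
Fix high bias before going on to the next step.
4. Do you have high variance? Fix it.
High variance means the dev set error is well above the training set error.
Once you fix your high variance, you're done!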
If Your Training and Test Data Do Not Come from the Same Distribution
1. Split your data
If your train and test data come from different distributions, make sure at least your dev and test sets are from the same distribution. You can do this by taking your test set and using half as dev and half as test.
Carve out a small portion of your training set (call this Train-Dev) and split your test data into Dev and Test. This gives you four sets: Train, Train-Dev, Dev, and Test.
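A minimal sketch of this four-way split is shown below. The function name `split_mismatched`, the 10% Train-Dev fraction, and the seed are assumptions for illustration; the text only says "a small portion" and does not fix a number.

```python
import random


def split_mismatched(train_pool, test_pool, train_dev_frac=0.1, seed=0):
    """Split data from two distributions into train / train_dev / dev / test.

    train_pool: examples from the training distribution.
    test_pool:  examples from the test distribution.
    train_dev_frac is an assumed value, not one taken from the talk.
    """
    rng = random.Random(seed)
    train_pool = list(train_pool)
    test_pool = list(test_pool)
    rng.shuffle(train_pool)
    rng.shuffle(test_pool)

    # Train-Dev is carved out of the training distribution.
    n_train_dev = round(len(train_pool) * train_dev_frac)
    train_dev = train_pool[:n_train_dev]
    train = train_pool[n_train_dev:]

    # Dev and Test both come from the test distribution, half each.
    half = len(test_pool) // 2
    dev = test_pool[:half]
    test = test_pool[half:]

    return {"train": train, "train_dev": train_dev, "dev": dev, "test": test}


sets = split_mismatched(
    train_pool=[("train_dist", i) for i in range(1000)],
    test_pool=[("test_dist", i) for i in range(200)],
)
print({name: len(items) for name, items in sets.items()})
# {'train': 900, 'train_dev': 100, 'dev': 100, 'test': 100}
```

Because Train-Dev shares a distribution with Train but is never trained on, the gap between the two isolates variance from distribution mismatch.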
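2. Measure your errors and calculate the relevant metrics
Calculate the metrics that tell you where to focus your efforts: bias, variance, train/test mismatch, and dev set overfitting. Each one is addressed in the steps that follow.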
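One way to compute these metrics is to treat each as the gap between two adjacent error measurements, in the order human, Train, Train-Dev, Dev, Test. The formulas below are an assumption that extends the bias and variance definitions from the same-distribution workflow, and the error rates are hypothetical.

```python
def error_gaps(human, train, train_dev, dev, test):
    """Return the four diagnostic gaps as a dict.

    Assumed definitions (each is the difference of adjacent error rates):
      bias                = train error     - human error
      variance            = train-dev error - train error
      train/test mismatch = dev error       - train-dev error
      dev set overfitting = test error      - dev error
    """
    return {
        "bias": train - human,
        "variance": train_dev - train,
        "train_test_mismatch": dev - train_dev,
        "dev_overfitting": test - dev,
    }


# Hypothetical error rates, expressed as fractions.
gaps = error_gaps(human=0.01, train=0.02, train_dev=0.025, dev=0.10, test=0.11)
for name, gap in gaps.items():
    print(f"{name}: {gap:.3f}")

# The largest gap shows where most of the error comes from.
print(max(gaps, key=gaps.get))  # train_test_mismatch
```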
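3. Do you have high bias? Fix it!
High bias means the training set error is well above human-level error.
4. Do you have high variance? Fix it.
High variance means the Train-Dev set error is well above the training set error.
Fix your high variance before going on to the next step.
5. Do you have train/test mismatch? Fix it.
Train/test mismatch means the dev set error is well above the Train-Dev set error.
Fix your train/test mismatch before going on to the next step.
6. Are you overfitting your dev set? Fix it.
Overfitting the dev set means the test set error is well above the dev set error.
Once you fix your dev set overfitting, you're done!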
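The ordering of these steps can be sketched as a simple check that returns the first problem still outstanding. The `threshold` value is purely hypothetical: the text does not say how large a gap counts as "high", so that judgment has to come from your own task.

```python
# Problems are checked in the same order as the steps above.
WORKFLOW_ORDER = ["bias", "variance", "train_test_mismatch", "dev_overfitting"]


def next_step(gaps, threshold=0.02):
    """Return the name of the first gap above threshold, or None if done.

    gaps: dict mapping each name in WORKFLOW_ORDER to an error gap.
    threshold: assumed cutoff for "high"; not a value from the talk.
    """
    for name in WORKFLOW_ORDER:
        if gaps[name] > threshold:
            return name
    return None


# Hypothetical gaps: only the train/test mismatch is large.
gaps = {
    "bias": 0.01,
    "variance": 0.005,
    "train_test_mismatch": 0.075,
    "dev_overfitting": 0.01,
}
print(next_step(gaps))  # train_test_mismatch

# Once every gap is small, there is nothing left to fix.
print(next_step({name: 0.0 for name in WORKFLOW_ORDER}))  # None
```

Checking in a fixed order, rather than picking the largest gap, mirrors the instruction to fix each problem before moving on to the next step.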
How to Fix High Bias
Ng suggests these ways for fixing a model with high bias:
Try a bigger model
Try training longer
Try a new model architecture (this can be hard)
How to Fix High Variance
Ng suggests these ways for fixing a model with high variance:
Get more data (this includes data synthesis and data augmentation)
Try adding regularization
Try early stopping
Try a new model architecture (this can be hard)
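Of the options above, early stopping is the easiest to show without committing to a framework. The sketch below assumes two hypothetical callbacks, `train_one_epoch` and `dev_error`, standing in for whatever your training code provides; the `patience` rule is one common variant, not something the talk specifies.

```python
def train_with_early_stopping(train_one_epoch, dev_error, max_epochs=100, patience=3):
    """Train until dev error has not improved for `patience` epochs.

    train_one_epoch: hypothetical callback that runs one epoch of training.
    dev_error:       hypothetical callback that returns the current dev set error.
    Returns (best_epoch, best_error). A real training loop would also save
    the model weights whenever a new best is found, and restore them at the end.
    """
    best_error = float("inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_one_epoch()
        err = dev_error()
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # dev error has stalled, so stop before overfitting further
    return best_epoch, best_error


# Simulated dev errors: improvement stops after epoch 2.
simulated = iter([0.30, 0.20, 0.15, 0.16, 0.17, 0.18, 0.10])
result = train_with_early_stopping(lambda: None, lambda: next(simulated))
print(result)  # (2, 0.15)
```

Note that the simulated run stops at epoch 5 and never sees the later 0.10; a larger `patience` trades extra training time for a lower chance of stopping too early.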
How to Fix Train/Test Mismatch
Ng suggests these ways for fixing a model with high train/test mismatch:
Try to get more data similar to your test data
Try data synthesis and data augmentation
Try a new model architecture (this can be hard)
How to Fix Dev Set Overfitting
Ng suggests only one way of fixing dev set overfitting:
Get more dev data
Presumably this would include data synthesis and data augmentation as well.
Source: https://github.com/thomasj02/DeepLearningProjectWorkflow